(54) Abstract Title
(57) A system for conducting a virtual audio-visual conference between two or more users 202 comprises two
or more client stations 204 each acting as a signal source and destination for each respective user, having a 
user interface (500, fig.5 not shown) for audio-visual input and output, one or more servers 206 and a network 
210 coupling the client stations and the servers. Each user is represented as a corresponding movable visual 
symbol displayed on the user interfaces of all coupled client stations. The audio signal of all users is generated 
at each client station with an attenuation according to the spatial position and possibly the orientation 
direction of the respective symbols on the user interfaces. By using the interface, a user may move his
corresponding symbol about or out of a virtual meeting room or close to some other symbols for 
eavesdropping and also mute or control the sound output or volume of other participating users. 
At least one drawing originally filed was informal and the print reproduced here is taken from a later filed formal copy.
VIRTUAL MEETING ROOMS WITH SPATIAL AUDIO

FIELD OF THE INVENTION 

This invention relates generally to the field of remote audio-visual 
conferencing and more specifically to a method and system for conducting virtual 
conferences with spatial audio. 

BACKGROUND OF THE INVENTION

Telephony conference calls are well known in the art. The most common type 
of conference call involves two or more users connected over a telephone line 
carrying on a multi-person conversation. Such conference calls are audio only with 
no visual representations. Algorithms such as loudest caller (D.L. Gibson et al., 
"Unattended Audioconferencing", BT Technology Journal, vol. 14, no. 4, Oct. 1997) 
are used to generate audio, but unfortunately do not provide naturalistic 
representations of the speakers' voices. 

There are also known in the art conferencing applications that provide a limited
visual representation of the conference. In one form of conferencing application, a 
simple list of the participants is displayed. The information provided to a participant 
is limited to merely the state of the conference call. Also, in the prior art, IBM has 
disclosed a conferencing application, known as IBM Java Phone 
(http://www.haifa.il.ibm.com/javbro_new2.html) which provides a limited visual 
representation of a conference. However, all of the above conferencing applications 
suffer from a lack of realistic sound reproduction because they do not consider a 
spatial or directional relationship between the participants. Furthermore, they fail to
provide a sense of "presence" or to consider the relative position of the participants.
They also do not provide a visual indication of which participants are currently online
before the conference call is initiated. In these prior art systems, the initiator of a
conference call must "set up" the conference call, which includes explicitly specifying,
locating and contacting prospective participants beforehand and then joining them to
the conference call.

The use of computer networks such as the internet for conferencing is also
known in the art. Personal computer based internet telephony applications such as
Microsoft Netmeeting provide both an audio and visual component to conferencing.
However, products such as Microsoft Netmeeting still suffer from the drawback that
the initiator must still contact each participant ahead of time using a regular phone to
ensure that all parties are at their desks and willing to participate in the conference
call. Such products still suffer from poor audio and visual quality and limited
conference control.

A prior art alternative to conference calls where the call must be previously 
arranged is the computer chat room. A multi-user computer chat room is a virtual 
meeting place commonly experienced by users of both the internet and intranets 
providing a means for establishing and maintaining formal contacts and collaboration. 
In a chat room, people assume virtual identities, which are generally known as 
avatars. Chat rooms can be connected to other such rooms allowing people to move 
from room to room, participating in different conversations. Any person in a room 
can talk to another person in the same room and conversations among users do not 
need to be announced although public and private conversations are allowed. One
particular standard for the implementation of chat rooms is Internet Relay Chat (IRC),
the technical details of which are disclosed at
http://www.irchelp.org/irchelp/ircprimer.html. In the evolution of the technology, the
prior art has developed three-dimensional multi-user rooms in which participants are 
represented by realistic renderings of people. Up until recently, communication in 
these virtual worlds has been limited to text. 

The current standard for three-dimensional virtual meeting places, VRML
(Virtual Reality Markup Language), has evolved to include sound sources as is
described in VRML 2.0 (http://vrml.sgi.com/moving-worlds). San Diego Center's
VRML Repository at http://sdsc.edu/vrml/ also has provided examples of the use of
chat rooms and the VRML standard. One of the major difficulties with the inclusion
of sound is delivering a realistic continuous sound signal to the participants. The
sound signal should sound "live", rather than delayed or pre-recorded, to facilitate
interactive communication. The sound of prior art systems and methods is typically
of poor quality and unrealistic. A further problem is that there is very little correlation
between the visual representation and the audio presentation. The prior art chat rooms
and virtual meeting place systems suffer from the same problems discussed above for
audio conferences, in that they do not provide realistic sound replication and do not
consider the visual position of the speaker relative to the listener when rendering the
audio.



No work had been performed on combining the technology of virtual meeting
places with audio which presents sound from all sound sources in their spatial
configuration with respect to each participant.

SUMMARY OF THE INVENTION

The present invention provides a system and method in which users can set up
voice conferences through a visual representation of a meeting room. The inventive
system and method provide both a visual sense of presence and a spatial sense
of presence. One feature of a visual sense of presence is that the participant is
provided with visual feedback on the participants in the conference. One feature of a
spatial sense of presence is that a conference does not need to be prearranged. A
further feature of the spatial sense of presence is that a person can be located by
sound. The audio stream emanating from the speaker is attenuated to reflect the
spatial distance between the speaker and the listener and also contains a directional
component that adjusts for the direction between the speaker and the listener. In the
inventive system and method, users can engage in a voice interaction with other users
who are represented on the user interface through visual representations, symbols or
avatars. The model of interaction (sometimes known as the "cocktail party" model)
provides navigational cues through pieces of conversations close in virtual space that
can be eavesdropped. As a participant moves through a virtual meeting place, he or
she can "browse" conversations and participate in those of interest. Each participant
receives a different sound mix as computed for the position of his or her avatar in
virtual space with respect to the others. Thus, audio is presented to each participant
that represents the sound generated from all sources in their spatial relationship with
respect to each participant.

Avatars can join a conversation (and leave another) by moving the avatar from 
the current group to another through virtual space. 



In one aspect of the present invention there is provided a system for
conducting a virtual audio-visual conference between two or more users comprising:

a) two or more client stations each acting as a signal source and 
destination for each respective user, having a user interface for audio- 
visual input and output including audio signal reception and generation 
means for receiving and generating audio signals; 

b) one or more servers; and 

c) a network coupling said client stations and said servers; 

wherein each user is represented as a corresponding movable visual 
symbol displayed on the user interfaces of all coupled client stations 
and the audio signal of all the users is generated at each client station 
attenuated according to the spatial position of respective symbols on 
the user interfaces. 

In another aspect of the present invention there is provided a method for 
generating a spatial audio signal in a virtual conference presented on an audio-visual 
device comprising the steps of: a) locating the position of a sound generating
participant in a virtual conference; b) locating the position of a listening participant in 
the virtual conference; c) calculating the signal strength of the signal received from 
the generating participant at the position of the listening participant based upon the 
distance between the sound generating participant and the listening participant; and 
d) generating an output signal corresponding to the calculated signal strength. 



Figure 1 is a representative overview diagram of a virtual world of the present 
invention. 

Figure 2 is a representative block diagram of a communication system for 
implementing the virtual world of the present invention with spatial audio. 

Figure 3 is a representation of a contour plot using a uniform model of sound 
distribution with one person in a virtual meeting room. 

Figure 4 is a representation of a contour plot using a uniform model of sound
distribution with three people in a virtual meeting room. 

Figure 5 is a representation of a user interface depicting a virtual meeting
room.

Figure 6 is a software architecture for implementing the present invention. 

Figure 7 is a representation of sound distribution using a directional model for 
one person in a meeting room. 

Figure 8 is a representation of sound distribution for one person where the 
angle of direction of the sound is illustrated. 

Figure 9 is a representation of directional sound distribution illustrating two 
participants. 

Figure 10A is a representation of directional sound distribution illustrating
eavesdropping by a third participant.

Figure 10B is a representation illustrating the attenuation at point b with
regard to a sound source at point a.

Figure 11 is a representation of an alternate embodiment of the present
invention where multiple rooms on the floor of a virtual building are illustrated.

Figure 12 is a representation of an alternate embodiment of the present
invention where a sidebar conversation is shown.

Figure 13 is a graphical representation of an alternate embodiment illustrating
where the real-time distance range from sound source is divided into intervals.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Turning to Figure 1, a virtual world 100 may depict some imaginary place or
model of an aspect of the real world. The virtual world 100 has a number of meeting
places where participants can interact. In a preferred embodiment, a meeting place
consists of a number of connected rooms 102 which may themselves be part of a virtual
building 104. The buildings can have a number of floors 106 and movement through
the building can be facilitated by an elevator 108. The rooms 102 are connected by
doors 110 and 112. Open doors 112 indicate that the voices from one room can be
heard in neighboring rooms. People interacting in the virtual world 100 are
represented by symbols or avatars 114 and can move around the virtual world 100.
Groups of people in a room 102 can have conversations.

Overlapping boundaries between conversations enable eavesdropping from
one conversation to another, with the intensity of the sound emanating from a
conversation dropping off with the distance from the other participants as described
with respect to the Figures below.

An avatar 114 can join or leave a conversation as the participant
changes the location of the avatar 114 in the virtual meeting room 102.
Eavesdropping occurs when a participant represented by avatar 114 listens to a
conversation different from the one in which it is currently engaged. A
participant represented by an avatar is also eavesdropping when it does not
take part in any conversation. Joining or leaving a conversation is achieved by
moving the avatar 114 from one participant or group of participants represented by
avatars 114 to another through the virtual world 100. In addition, eavesdropping can
be restricted to specific participants in order to support sidebar conversations or a
"cone of silence" (conversations restricted to only a specific subset of participants
represented). This is described in further detail with respect to Figure 12.

Turning to Figure 2, a communication system 200 embodying the present
invention is shown. The example shown in Figure 2 is a client server architecture,
although the invention can easily be modified to operate on a single stand-alone
machine using a graphical terminal or interface. Users 202 interface with client
stations 204 to participate and communicate with other users 202. Client stations 204
are communications devices or personal computers such as are well known in the art
with graphical user interfaces and may include a keyboard, a pointing device such as a
mouse or joystick, and an audio system with microphone and speakers or headphones. In
a preferred embodiment, client stations 204 are personal computers running an
operating system such as Windows 95 from Microsoft, although other operating
systems and graphical user interfaces such as are well known in the art could be used.
In the preferred embodiment, client stations 204 connect to servers 206 through local
area networks 208. Servers 206 can be any appropriate commercially available
software and hardware devices such as are well known in the art. In a preferred
embodiment, server 206 is an Intel processor based network server from Compaq
Computers running the Windows NT operating system from Microsoft. The local
area networks 208 can be based on Ethernet or any other commercially available local
area network. Local area networks 208 can be interconnected through a wide area
communication system 210, which may also be an ATM network or a network of any
other type that allows client stations 204 to connect to servers 206. Servers 206 are
also optionally connected to peripheral devices such as printers and may have
connections to other systems and devices, including both voice systems and data
systems, in the outside world. The method and system of the present invention are
typically implemented using software running on client stations 204 and servers 206.

Turning to Figure 3, an illustration of sound intensities assuming a uniform
distribution of sound emanating from an avatar 302 in a meeting room 300 is shown.
An x-y grid can be superimposed on the meeting room to identify each point in the
room. A formula to compute the intensity of sound distribution of a point source at a
point $(x, y)$ of the signal of a sound source located at $(x_0, y_0)$ (assuming $A$ is the initial
intensity at which the source is generating sound signals and $\lambda$ determines how fast
the intensity decays) can be approximated by an inverse square function:

$$I(x, y) = \frac{A}{\lambda\left((x - x_0)^2 + (y - y_0)^2\right) + 1}$$

Intensity and $A$ may be measured in any appropriate units, such as decibels.

Figure 3 shows a contour plot of such a sound distribution where $\lambda = 0.05$. In
Figure 3 the sound source (avatar 302) is located at point (5,5) in virtual room 300
with dimensions of 20x10 units, and generates sound signals with an initial intensity
$A$ equal to 3. In Figure 3, the white area on the plot corresponds to highest intensity,
and as the grey level darkens, the intensity drops to 0.0.

Turning to Figure 4, an illustration of a more complex room 400 containing
three avatars 402, 404 and 406 is shown. Avatars 402, 404 and 406 are illustrated
with locations as indicated in Table 1. This scenario illustrates a typical meeting room
with three avatars grouped around a set of tables 401.


TABLE 1

             Location (x_0, y_0)    Intensity A
Avatar 402   (15,8)                 1.0
Avatar 404   (15,2)                 2.0
Avatar 406   (5,5)                  3.0



In the example of Figure 4, avatar 402 generates a signal with intensity 1.0,
avatar 404 generates a signal with intensity 2.0, and avatar 406 generates a signal with
intensity 3.0.

The total intensities and the contributions from each individual avatar 402, 404
and 406 at each location are shown in Table 2. Each avatar 402, 404, 406 hears the
sound contributions of the other avatars. The contribution of each avatar is calculated
using the formula described with respect to Figure 3 where the point $(x, y)$ is the
position of the avatar hearing the sound, and the point $(x_0, y_0)$ and $A$ are the location
and intensity respectively, of the avatar generating the sound. The total intensity is
the sum of the contributions of each avatar. The total intensity at any point represents
the sound that would be heard by the avatar at that point, and would be the audio
output through the speaker or headset of the participant represented by that avatar at
the participant's client station.

In Figure 4, using the formula previously described with respect to Figure 3,
the sound intensity or spatial audio for the entire virtual room can be calculated. For
example, the intensity around the point (10,5) is 2.4. Towards the middle side of the
room, at point (10,2), it is 2.2. In the lower left corner, at location
(0,0), the intensity is 1.1.

TABLE 2

Location         Total       Contributed from   Contributed from   Contributed from
                 intensity   avatar 1 (402)     avatar 2 (404)     avatar 3 (406)
Avatar 1 (402)   2.1794      1.0                0.714286           0.465116
Avatar 2 (404)   2.82226     0.357143           2.0                0.465116
Avatar 3 (406)   3.46512     0.155039           0.310078           3.0
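
The Table 2 figures and the sample points quoted for Figure 4 can be checked numerically. The following Python sketch is illustrative only and is not part of the described system: the names `intensity`, `contributions`, `total_intensity` and `AVATARS` are assumptions introduced here, and the decay factor 0.05 is the value quoted for Figure 3, assumed to apply to Figure 4 as well (it reproduces the tabulated values).

```python
# Illustrative sketch of the uniform (point source) intensity model.
LAMBDA = 0.05  # decay factor; value quoted for Figure 3, assumed for Figure 4

# Avatar reference numeral -> ((x0, y0), initial intensity A), as in Table 1.
AVATARS = {
    402: ((15.0, 8.0), 1.0),
    404: ((15.0, 2.0), 2.0),
    406: ((5.0, 5.0), 3.0),
}


def intensity(x, y, x0, y0, a, lam=LAMBDA):
    """Intensity at (x, y) of a source at (x0, y0) with initial intensity a."""
    return a / (lam * ((x - x0) ** 2 + (y - y0) ** 2) + 1.0)


def contributions(x, y, sources=AVATARS):
    """Per-avatar contributions heard at the point (x, y)."""
    return {
        avatar: intensity(x, y, pos[0], pos[1], a)
        for avatar, (pos, a) in sources.items()
    }


def total_intensity(x, y, sources=AVATARS):
    """Sum of all contributions at the point (x, y)."""
    return sum(contributions(x, y, sources).values())


if __name__ == "__main__":
    # One output line per listener position, mirroring the rows of Table 2.
    for avatar, (pos, _) in AVATARS.items():
        parts = contributions(pos[0], pos[1])
        print(avatar, round(sum(parts.values()), 4),
              {k: round(v, 6) for k, v in parts.items()})
```

Evaluating `total_intensity` at (10,5), (10,2) and (0,0) gives values that round to the 2.4, 2.2 and 1.1 quoted in the text.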



Turning to Figure 5, an example of a user interface 500 of a client station 204
of Figure 2 is shown. As discussed with respect to client station 204 of Figure 2, each
user interface 500 includes a screen and input/output means such as a mouse,
keyboard, CPU and audio input/output device such as speakers and a microphone.
The user interface 500 could operate on any typical graphical computing environment,
such as Windows 95, X Windows or a graphical terminal. The user interface 500 could
be programmed in software in any suitable, well known computing language for
execution on client station 204 of Figure 2. The "Meeting Room" window 502 shows
the location of the facilities (tables 504, doors 506, etc.) in the meeting room and the
representations of the participants in the meeting room (avatars 508, 510, and 512).
The window title 514 also indicates the name of the room. Participants are identified
by a participant identifier 516, such as a number that appears in the list in the "Meeting
Room Inspector" window 518. Alternatively, photographs of the participants, or the
names of the participants if the space on the window allows, could be used to
represent the participants.

Each participant can move in virtual space by repositioning its avatar 508, 510,
512 with the pointing device. The participant might also change the orientation of its
avatar 508, 510, 512 if, instead of the point source model of sound, a directional
sound model is employed as further described with respect to Figures 7 to 10.

The "Meeting Room Inspector" window 518 provides the means to view the
progress of the conference. The window 518 presents a list of the names of the
current participants and matches them up with the participant identifier 516 used in
"Meeting Room" window 502. It can also provide settings control such as mute
control 520 for adjusting the environment such as muting a participant. Through the
mute control 520, a user can instruct the system not to output audio from a
participant, although the participant's avatar might be within audible distance. This
control feature can be used when the participant at user interface 500 does not want to
listen to another participant (for example, because the other participant is noisy or
makes obscene remarks).

Similarly, the participant at user interface 500, which would be represented by
a participant identifier 516 in meeting room inspector window 518, may also wish that
all other participants not hear what is going on locally. By selecting mute control 520
corresponding to the participant identifier 516 for the participant at user interface 500,
that participant can prevent local audio from going to the other participants, thereby
performing a form of call screening.

In an alternate embodiment, not shown, a similar control window to the
meeting room inspector window could be used to selectively choose which
participants can hear regular audio. By selecting the appropriate settings, a participant
can tell the system which other participants are to hear the audio. This is a way of
implementing a sidebar conversation as described in further detail with respect to
Figure 12. Finally, the user interface 500 has a volume control window 522 by which
the user can modify the intensity of its signal, for example, to compensate for weak line
transmission.

Turning to Figure 6, an example of a software architecture 600 for message
flow between the components of the communication system 200 of Figure 2 of the
present invention is shown.

The architecture 600 shows a configuration with three participants, A, B, and C,
where client subsystems 602 and 604 for participants A and C only are shown in full.
Client subsystems 602 and 604 are run on the client stations 204 of Figure 2 with each
participant represented as an avatar on the user interface of each client station. Each
participant has a corresponding client subsystem (602, 604) within its client station
which consists of a source 606 and 608 and a mixer 610 and 612 respectively. The
source 606, 608 is a software module that receives audio input from a microphone by
calling the sound card driver API on the client station. The source 606, 608 receives
the audio input from the participant and generates a stream of audio updates together
with information on the current location of the participants. The mixer 610, 612 is a
software module that receives audio streams and location information from the other
client subsystems and integrates and synchronizes the audio streams as described
below.

Client subsystems 602 and 604, of which the mixers 610, 612 are a part, do
not interact with each other directly but send their updates to a world server 614
which then dispatches them to the appropriate client subsystems 602 and 604. The
world server 614 is typically run as a software module on a server 206 of Figure 2. In
addition to providing audio services, the world server 614 also provides the necessary
communications management of the graphics signals in a manner such as is well
known in the art to support the user interface discussed with
respect to Figure 5. Communication is facilitated by packets passed between client
subsystems 602, 604 and world server 614. Each client subsystem 602, 604 is
represented by its own thread (reflector) in the world server 614 that handles updates
from its client subsystem and forwards updates to the other reflectors 616, 618 and
620 in the world server 614. For each client there is a corresponding reflector 616,
618 and 620 in world server 614.

In an alternate embodiment (not shown), the world server could be separated
from the system or server providing the graphical representation of the virtual world.
In this manner, the present invention can be used to extend a prior art virtual world, such
as VRML, with the world server 614 of the present invention dedicated to carrying the
voice traffic between the participants. This significantly enhances the performance of
existing systems, which are based on sharing the same LAN or Internet for data and
voice traffic.

An example of the typical message flow between client subsystems 602, 604
and world server 614 can be illustrated as follows:

1. Client subsystem 602 (A) updates its input audio stream 622 and sends a
packet to the world server 614 together with the location of the participant.

2. Reflector 616 (A) receives the update packet and forwards it to all other
reflectors, namely reflector 618 (B) and reflector 620 (C).

3. Reflector 620 (C) sends a request 626 to mixer 612 (C) to mix the update
packet into its output audio stream. Mixer 612 (C) synchronizes the audio
packet with the other audio packets it has received but not yet played and adds
the audio streams locally. Reflector 618 (B) similarly requests Mixer B (not
shown) to mix in the update and Mixer B acts on it.
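
The three steps of this message flow can be sketched in Python. This is a minimal single-process illustration, assuming in-memory method calls in place of network packets and threads; the class and field names (`UpdatePacket`, `Mixer`, `Reflector`, `WorldServer`, `join`, `receive`) are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class UpdatePacket:
    source_id: str    # identifies the sending participant
    sequence: int     # sequence number of this audio update
    location: tuple   # (x, y) position of the sender's avatar
    audio: bytes      # opaque audio payload


@dataclass
class Mixer:
    owner: str
    pending: list = field(default_factory=list)

    def mix_in(self, packet):
        """Queue a packet to be synchronized and added to the output stream."""
        self.pending.append(packet)


class Reflector:
    """Stands in for the per-client thread inside the world server."""

    def __init__(self, participant, mixer):
        self.participant = participant
        self.mixer = mixer
        self.peers = []

    def on_client_update(self, packet):
        # Step 2: forward the update to all other reflectors.
        for peer in self.peers:
            peer.on_forwarded_update(packet)

    def on_forwarded_update(self, packet):
        # Step 3: ask this client's mixer to mix the update in.
        self.mixer.mix_in(packet)


class WorldServer:
    def __init__(self):
        self.reflectors = {}

    def join(self, participant):
        """Create a reflector for a new client and return the client's mixer."""
        mixer = Mixer(participant)
        reflector = Reflector(participant, mixer)
        for other in self.reflectors.values():
            other.peers.append(reflector)
            reflector.peers.append(other)
        self.reflectors[participant] = reflector
        return mixer

    def receive(self, packet):
        # Step 1: an update packet arrives from a client subsystem.
        self.reflectors[packet.source_id].on_client_update(packet)


if __name__ == "__main__":
    server = WorldServer()
    mixers = {name: server.join(name) for name in "ABC"}
    server.receive(UpdatePacket("A", 1, (5.0, 5.0), b"\x00\x01"))
    print({name: len(m.pending) for name, m in mixers.items()})
```

In this sketch the sender's own mixer never receives its own packet, which matches the flow described: reflector A forwards only to reflectors B and C.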

The software architecture 600 illustrated above is only one preferred
embodiment where the invention may be deployed, in which audio processing is
distributed among clients. Alternative embodiments, not shown, are possible where
all software modules, except for the client display and client-side audio streaming, but
including audio attenuation and mixing for each client, could run on a central
multipoint control unit (MCU) on a server of Figure 2. The choice whether to
centralize or distribute the processing is based simply on practical considerations such
as the processing power required for real-time audio processing, communications
speed, etc., and does not affect the essence of the invention. For example, an alternate
embodiment of the invention in an architecture using the H.323 standard for IP-based
audio and video communication services in local area networks as described at
[http://gw.databeam.com/h323/h323primer.html] could be used. In this alternate
embodiment, the present invention is deployed using a multipoint control unit (MCU)
that supports conferences between three or more endpoints. The MCU consists of a
Multipoint Controller (MC) and several Multipoint Processors (MP) deployed
similarly to client stations. All endpoints send audio streams to the MCU in a peer-to-
peer fashion. The MP performs the mixing of the audio streams and sends the resulting
streams back to the participating terminals or client stations.

In an alternate embodiment, to reduce the load on the network, the world
server 614 will actually choose not to forward an audio packet if the participants are
too far apart or they are not in sight-line (as shown on the user interface) and to
aggregate audio packets from nearby participants when forwarding an audio packet to
a "remote" participant. Participants can also be excluded from a conversation to create
a sidebar conversation or cone of silence as described in further detail with respect to
Figure 12.

This alternate embodiment is an optimization to reduce the number of packets
sent across the network. When the world server 614 determines from the
distance between two avatars that the corresponding sound attenuation is
below some threshold, the world server 614 can choose not to forward an audio packet.
Similarly, it can suppress an audio packet if there is an obstacle shown on the user
interface (such as a wall) between the avatars that would prevent the propagation of
sound between them.
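
The forwarding decision described here can be sketched as a small predicate. This is a hedged illustration, not the specified implementation: it assumes the uniform model of Figure 3 for the attenuation, a hypothetical threshold of 0.1, and a caller-supplied `blocked` flag standing in for whatever obstacle test the user interface geometry would provide. The names `attenuation_factor` and `should_forward` are introduced here.

```python
def attenuation_factor(sender_pos, listener_pos, lam=0.05):
    """Fraction of the sender's intensity that reaches the listener
    under the uniform model (lam is the decay factor)."""
    dx = listener_pos[0] - sender_pos[0]
    dy = listener_pos[1] - sender_pos[1]
    return 1.0 / (lam * (dx * dx + dy * dy) + 1.0)


def should_forward(sender_pos, listener_pos, threshold=0.1,
                   blocked=False, lam=0.05):
    """Decide whether the server forwards an audio packet.

    threshold is a hypothetical audibility cut-off; blocked is True when
    an obstacle such as a wall lies between the two avatars.
    """
    if blocked:
        return False
    return attenuation_factor(sender_pos, listener_pos, lam) >= threshold


if __name__ == "__main__":
    print(should_forward((5, 5), (6, 5)))                # nearby listener
    print(should_forward((0, 0), (20, 10)))              # opposite corner
    print(should_forward((5, 5), (6, 5), blocked=True))  # wall in between
```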

Returning to Figure 6, the synchronization technique used to align the audio
packets arriving at mixers 610 and 612 (originating from different instances of the
same message flow) is based on standard techniques for compressing and expanding
audio packets, for example, as described in U.S. Patent 5,784,568. Each audio packet
contains an identifier and a sequence number. The identifier uniquely identifies its
source and the sequence number allows the mixers 610 and 612 to drop and/or
interpolate between packets.
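
How a sequence number lets a mixer drop or interpolate packets can be illustrated with a deliberately simple sketch. It is not the technique of U.S. Patent 5,784,568, whose details are not given here; it assumes that each packet carries one fixed-length frame of float samples, that late packets are simply discarded, and that gaps are filled by linear interpolation between the neighbouring frames. The names `accept_packet` and `fill_gaps` are hypothetical.

```python
def accept_packet(sequence, last_played):
    """Drop rule: keep a packet only if it is newer than the last one played."""
    return sequence > last_played


def fill_gaps(frames):
    """Order one source's frames by sequence number and fill missing ones.

    frames maps sequence number -> list of float samples (equal lengths).
    Missing sequence numbers between two received frames are replaced by a
    linear blend of those two frames.
    """
    if not frames:
        return []
    seqs = sorted(frames)
    ordered = []
    for prev, nxt in zip(seqs, seqs[1:]):
        ordered.append(frames[prev])
        gap = nxt - prev - 1  # number of missing frames between prev and nxt
        for i in range(1, gap + 1):
            w = i / (gap + 1)
            ordered.append([(1 - w) * a + w * b
                            for a, b in zip(frames[prev], frames[nxt])])
    ordered.append(frames[seqs[-1]])
    return ordered


if __name__ == "__main__":
    received = {1: [0.0, 0.0], 3: [2.0, 4.0]}  # frame 2 was lost
    print(fill_gaps(received))
```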

The mixers 610 and 612 use the location information in each of the update
message packets to determine the audio signal to be delivered to each participant.
Using the computation procedure described with respect to Figures 3 and 4 for a
uniform distribution, or Figures 8 to 11 for a directional distribution, the mixers 610
and 612 calculate and determine the signal strength by attenuation of the audio signal
to simulate the drop in intensity. In a preferred embodiment, all computation is done
locally at the mixers 610 and 612 to minimize the computational load of the world
server 614.

An example of the attenuation of the signal strength is described below. The
procedure can easily, with obvious modifications, be applied to a directional
distribution sound model. If the location information for the sending Source S as
indicated in the update message is $(x_S, y_S)$ and the current location of receiving Source
R is $(x_R, y_R)$, the audio signal is attenuated by the following factor $A$:

$$A(x_R, y_R) = \frac{I(x_R, y_R)}{A_S} = \frac{1}{\lambda\left((x_R - x_S)^2 + (y_R - y_S)^2\right) + 1}$$

using the formula for the intensity of a sound source described with respect to Figures
3 and 4. In the formula for the intensity we need to substitute $(x_0, y_0)$ by $(x_S, y_S)$ and $A$
by $A_S$.
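
A mixer applying this factor might look like the following Python sketch. It is an illustration under stated assumptions: audio is represented as plain lists of float samples, streams are already synchronized, mixing is a sample-wise sum of the attenuated streams (following the statement that the total intensity is the sum of the contributions), and the decay factor defaults to the 0.05 used for Figure 3. The names `attenuation` and `mix` are introduced here.

```python
def attenuation(receiver, sender, lam=0.05):
    """Attenuation factor for a sender at (x_S, y_S) heard at (x_R, y_R)."""
    dx = receiver[0] - sender[0]
    dy = receiver[1] - sender[1]
    return 1.0 / (lam * (dx * dx + dy * dy) + 1.0)


def mix(receiver, streams, lam=0.05):
    """Mix synchronized streams for one receiver.

    streams is a list of (sender_position, samples) pairs; each stream is
    scaled by its attenuation factor and the results are summed per sample.
    """
    length = max((len(samples) for _, samples in streams), default=0)
    output = [0.0] * length
    for sender, samples in streams:
        gain = attenuation(receiver, sender, lam)
        for i, value in enumerate(samples):
            output[i] += gain * value
    return output


if __name__ == "__main__":
    # Listener at the position of avatar 402, hearing avatars 404 and 406.
    listener = (15.0, 8.0)
    streams = [((15.0, 2.0), [2.0, 2.0]), ((5.0, 5.0), [3.0, 3.0])]
    print(mix(listener, streams))
```

With unit-amplitude bookkeeping as in this usage example, the per-stream gains for the listener at (15,8) are the same ratios that produce the 0.714286 and 0.465116 entries of Table 2.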

Turning to Figure 7, an alternate embodiment of the present invention
illustrating a directional sound source is shown. The implementation of a directional
sound source as an alternative to the uniform model of Figures 3 and 4 for calculation
of the sound intensity provides improvements in quality and realism. As previously
described, the examples of Figures 3 and 4 use a uniform model of sound propagation,
that is, the sound intensity drops off as the radial distance from the participant
increases. A more realistic model is to model participants as directional sound
sources.

In a directional sound source model, the range of the sound emitted by each
participant can be approximated by an ellipse. As shown in Figure 7, the ellipse 702
is defined by the origin of the sound source 704 (point A), which coincides with a
focus of ellipse 702, the forward range 706 ($max_A$) from the sound source, the
backward range 708 ($min_A$), and its orientation in space, that is, the directionality of
the sound, as indicated by the unit vector 710 ($u_A$). The sound intensity drops
proportionally to the square of the real-time distance (that is, distance normalized to a
value between 0 and 1) from the sound source. Mathematically, the intensity never
actually drops to 0. However, at some distance the intensity will drop below the
audibility threshold. We thus select the decay factor $\lambda$ such that the attenuation at the
boundary of the ellipse will bring the intensity below a user-defined audibility
threshold. This threshold may be a parameter that the user or administrator can set
through the graphical user interface to calibrate the system. We can select a value of
$\lambda$ such that at the boundary of the ellipse, the intensity will be a fixed fraction of the
initial intensity. This is described in further detail with respect to Figures 10A and 10B.

Turning to Figure 8, an example illustrating the angle of directionality of 
sound from a participant is shown. 

The participant A is represented by avatar 802 on a graphical display device.
The orientation of the avatar 802 can be defined by a unit vector u_A rooted in a focus
of an ellipse 804 that describes the sound distribution superimposed over avatar 802.
The focus of ellipse 804 coincides with the origin of the sound source, avatar 802. An
(x,y) coordinate system can be superimposed at the origin of the sound source, avatar
802. The unit vector u_A forms an angle φ with the vector (-1,0) of the (x,y) coordinate
system, as shown in Figure 8. A participant A, through the graphical user interface,
can adjust the orientation of the avatar using the pointing device used for moving the
avatar 802 in virtual space or through a physical input device. There are various ways
a participant A could specify the angle φ, for example, by rotating a dial on the screen
with the mouse, or by turning a dial on a physical input device.

Turning to Figure 9, an example illustrating the directionality of sound from
two participants is shown.

Participants can only hear each other when they are in each other's range. In
Figure 9, participant A is represented by avatar 902 on a graphical user interface,
which has a directional sound distribution represented by ellipse 904. Likewise,
participant B is represented by avatar 906, which has a directional sound distribution
represented by ellipse 908. Whether participant B can hear participant A is
determined by whether avatar 906 of participant B is inside ellipse 904 describing
participant A's sound distribution. As can be seen from Figure 9, avatar 906 of
participant B is inside ellipse 904 of participant A; therefore, participant B can hear
the sound emanating from participant A. In contrast, avatar 902 of participant A is
not within the ellipse 908 of participant B; therefore, participant A cannot hear sound
emanating from participant B.
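The range test described above can be sketched as a point-in-ellipse check. The sketch assumes the standard polar form of an ellipse about a focus, with the forward range measured along the orientation vector and the backward range measured against it; the function name, argument names and the sample coordinates are hypothetical.

```python
import math


def can_hear(listener, source, direction, forward, backward):
    """Return True if `listener` lies inside the sound ellipse of `source`.

    `direction` is the unit orientation vector of the source; `forward` and
    `backward` are the ranges along and against that vector (assumed names).
    """
    dx = listener[0] - source[0]
    dy = listener[1] - source[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return True  # listener sits on the source itself
    # Cosine of the angle between the orientation and the line to the listener.
    cos_omega = (dx * direction[0] + dy * direction[1]) / distance
    # Distance from the focus to the ellipse boundary in that direction.
    boundary = 2.0 * forward * backward / (
        forward + backward - (forward - backward) * cos_omega
    )
    return distance <= boundary
```

Because each ellipse is oriented, the test is not symmetric. With both avatars facing along (1, 0), a listener 5 units in front of a source with forward range 10 and backward range 2 is in range, while the reverse test fails because the first avatar is behind the second.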

Eavesdropping on conversations can be defined in terms of a table. Table 3
illustrates when a third participant would be able to eavesdrop on the conversation
between two other participants. A participant represented by an avatar is said to be
able to "eavesdrop" on another conversation if it is located sufficiently "close" to the
avatars representing the parties involved in the conversation.





            I_A = 0    I_A > 0

I_B = 0     NO         NO

I_B > 0     NO         YES

Table 3



Table 3 indicates that in order for a third participant to eavesdrop on the
conversation between two other participants, A and B, the intensities I_A and I_B, as
measured at the location of the third participant, must both be greater than 0. Another
way of stating this is that the third participant must be in the intersection of the
elliptical sound distributions for A and B, assuming that the intensity is set to 0
outside of the ellipse for computational efficiency.

Turning to Figure 10A, the eavesdropping of a third
participant on the conversation of two other participants is illustrated using a
directional sound distribution. In Figure 10A, participant A is represented by avatar
1002 on a graphical user interface, which has a directional sound distribution
represented by ellipse 1004. Likewise, participant B is represented by avatar 1006,
which has a directional sound distribution represented by ellipse 1008. A third
participant C who wishes to eavesdrop, represented by avatar 1010, is shown in four
positions: C, C', C'' and C''' respectively. With avatar 1010 at position C''', neither
participant A nor participant B is audible to participant C. At position C'', as avatar
1010 approaches avatar 1002, participant A becomes audible, but not participant B.
With avatar 1010 at position C', participant B becomes audible, but not participant A.
With avatar 1010 at position C, both participant A and participant B (i.e. the
conversation) become audible, as avatar 1010 is in the boundary defined by the
intersection of the two sound distribution ellipses 1004 and 1008.

This can also be represented by a table. Table 4 below, which is similar to 
Table 3, illustrates how sound can provide a navigational cue. 





            I_A = 0    I_A > 0

I_B = 0     C'''       C''

I_B > 0     C'         C

Table 4

Tables 3 and 4 can be generalized to multiple avatars in an obvious manner. 
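One way to read this generalization, offered only as a sketch, is that a listener can eavesdrop on a conversation when every party to it has an intensity greater than 0 at the listener's location. The dictionary representation and the function names below are assumptions made for illustration.

```python
def audible_sources(intensities):
    """Names of all sources whose intensity at the listener is above 0."""
    return {name for name, value in intensities.items() if value > 0}


def can_eavesdrop(intensities, conversation):
    """True when every party to `conversation` is audible to the listener.

    `intensities` maps a participant name to the intensity measured at the
    listener's location; a missing participant is treated as inaudible.
    """
    return all(intensities.get(name, 0.0) > 0 for name in conversation)
```

For two parties this reduces to Table 3, and the set returned by `audible_sources` corresponds to the navigational cue of Table 4: it tells the listener which avatars it is currently within range of.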
The intensity of sound experienced at a position B relative to a sound source at
position A for a directional sound model can be determined numerically. A's sound
distribution as measured at point b is defined by the origin of the sound source a and
parameters u_A, max_A, min_A and N_A, as discussed above with respect to Figures 7 to 9.
This approximation assumes that the attenuation factor N_A has been chosen such that
the sound intensity from a participant at location A is above an audibility threshold.
We can select a value of λ such that at the boundary of the ellipse, the intensity will
be 1/N of the initial intensity. To simplify the calculations, we can set the sound
intensity from a sound source A to zero outside A's ellipse. This is a practical
assumption that reduces the computational effort required for computing the various
sound distributions.

The formula for the attenuation at a point b with regard to a sound
source in a is:

    A(x_B, y_B) = \frac{1}{1 + (N_A - 1)\,\dfrac{(b - a)\cdot(b - a)}{r(\max_A, \min_A, \pi - \omega)^2}}

where

    \omega = \arccos\frac{(b - a)\cdot u_A}{\sqrt{(b - a)\cdot(b - a)}}

and

    r(\max, \min, \phi) = \frac{2\,\max\,\min}{\max + \min + (\max - \min)\cos\phi}

When point B is at the periphery of the ellipse, we get, according to the definition of
real-time distance:

    A(x_B, y_B) = \frac{1}{1 + (N - 1)(1)} = \frac{1}{N}

The intensity is simply the product of the base intensity of the sound source and the
attenuation at the point for which the intensity is computed.
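The attenuation formula can be sketched as a small Python function. It uses the identity cos(π - ω) = -cos ω so that only the cosine of ω is needed, and it optionally applies the simplification of setting the attenuation to zero outside the ellipse. The function name, argument names and the `clip_outside` flag are hypothetical.

```python
import math


def directional_attenuation(a, b, u, forward, backward, n, clip_outside=True):
    """Attenuation at point b for a directional source at a.

    u is the unit orientation vector, forward/backward are the ranges
    max and min, and n is the decay factor N (all names are assumptions).
    """
    dx = b[0] - a[0]
    dy = b[1] - a[1]
    dist_sq = dx * dx + dy * dy            # (b - a) . (b - a)
    if dist_sq == 0.0:
        return 1.0                         # no attenuation at the source
    cos_omega = (dx * u[0] + dy * u[1]) / math.sqrt(dist_sq)
    # r(max, min, pi - omega), using cos(pi - omega) = -cos(omega)
    r = 2.0 * forward * backward / (
        forward + backward - (forward - backward) * cos_omega
    )
    relative_sq = dist_sq / (r * r)        # squared normalized distance
    if clip_outside and relative_sq > 1.0:
        return 0.0                         # outside the ellipse
    return 1.0 / (1.0 + (n - 1.0) * relative_sq)
```

On the periphery of the ellipse the normalized distance is 1, so the function returns 1/N, in line with the boundary case stated above.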



The attenuation can be illustrated by example as shown in Figure 10B. Figure
10B shows a graphical representation 1050 with a sound source A (1052) and B
(1054). Assume sound source A (1052) is located at a = (-2,2), with a base intensity
of 3 and has a forward range of 20, a backward range of 1, and an orientation of 60°
(degrees) or π/3 (radians) from the -x axis and decay factor N = 5. Sound source B
(1054) is located at b = (-4,8), and has a forward range of 10, a backward range of 5,
and an orientation of 270° (degrees) or 3π/2 (radians).



The unit vector u for the directional sound source at A (1052) is given by:

    u = \{-\tfrac{1}{2},\ \tfrac{\sqrt{3}}{2}\}

Continuing the example, we can calculate the common terms as set out below.

    b - a = \{-4 - (-2),\ 8 - 2\} = \{-2, 6\}

    (b - a)\cdot u = \{-2, 6\}\cdot\{-\tfrac{1}{2},\ \tfrac{\sqrt{3}}{2}\} = 1 + 3\sqrt{3} = 6.19615

    (b - a)\cdot(b - a) = \{-2, 6\}\cdot\{-2, 6\} = (-2)(-2) + (6)(6) = 40



Further continuing the example, we can calculate ω as set out below.

First we compute the angle ω between b-a and u. This angle is then used as input to
the formula for r, the real-time distance between A and B.

The cosine of ω becomes:

    \cos(\omega) = \frac{(b - a)\cdot u}{\sqrt{(b - a)\cdot(b - a)}} = \frac{1 + 3\sqrt{3}}{\sqrt{40}} = \frac{1 + 3\sqrt{3}}{2\sqrt{10}} = 0.979698

Thus we obtain ω:

    \omega = \arccos(0.979698) = 0.201848

From the above, we can perform the calculation of r(max, min, φ)
where φ = π - ω.

    \phi = \pi - \omega = 3.141593 - 0.201848 = 2.93974
Continuing the example where max = 20 and min = 1, plugging into the formula for r,
we obtain:

    r = \frac{2\,\max\,\min}{\max + \min + (\max - \min)\cos\phi} = \frac{2(20)(1)}{20 + 1 + (20 - 1)\cos(2.93974)} = 16.7663

Alternatively, from geometry we know that cos(π - ω) = -cos ω. Although above we
computed the value of ω for clarity, in fact, to reduce the calculations, we only need
to compute cos ω, and can avoid recomputing the cosine of π - ω in the formula
for r. We thus could have computed r more simply as follows:

    r = \frac{2\,\max\,\min}{\max + \min - (\max - \min)\cos\omega} = \frac{2(20)(1)}{20 + 1 - (20 - 1)\cos(0.201848)} = 16.7663

Calculation of the Attenuation at Point B

The sound intensity drops proportionally to the square of the real-time distance from
the sound source. Since, mathematically, the intensity never actually drops to 0, we
select the decay factor λ such that the attenuation at the boundary of the ellipse will be
1/N-th of the initial intensity. N should be chosen such that for attenuations larger than an
N-fold reduction the sound is below the audibility threshold. This threshold may be a
parameter that the user or an administrator can set through the graphical user interface
during a calibration phase.



The formula, as previously discussed, for computing the attenuation at point B is:

    A(x_B, y_B) = \frac{1}{1 + (N - 1)\,\dfrac{(b - a)\cdot(b - a)}{r(\max, \min, \pi - \omega)^2}}

If we choose N = 5, plugging in the intermediate results from above, we have an
attenuation A(x_B, y_B) of:

    \frac{1}{1 + (5 - 1)(40)/(16.7663)^2} = 0.637277

Calculation of the Sound Intensity at Point B

Assuming a base intensity at point A of 3, the sound intensity I(x_B, y_B) at point B is
(base intensity of A) * (attenuation at point B):

    I(x_B, y_B) = A \cdot A(x_B, y_B) = 3 \times 0.637277 = 1.91183
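The calculation example can be checked numerically with a short script. The values are those given for sound source A in Figure 10B; the variable names are chosen for this sketch only.

```python
import math

# Parameters of sound source A and the position of point B (from the example).
a = (-2.0, 2.0)
b = (-4.0, 8.0)
u = (-0.5, math.sqrt(3.0) / 2.0)   # orientation of 60 degrees from the -x axis
forward, backward = 20.0, 1.0      # max and min ranges
n = 5.0                            # decay factor N
base_intensity = 3.0

# Common terms.
diff = (b[0] - a[0], b[1] - a[1])              # b - a
dot_u = diff[0] * u[0] + diff[1] * u[1]        # (b - a) . u
dist_sq = diff[0] ** 2 + diff[1] ** 2          # (b - a) . (b - a)

# cos(omega), then r via the shortcut cos(pi - omega) = -cos(omega).
cos_omega = dot_u / math.sqrt(dist_sq)
r = 2.0 * forward * backward / (
    forward + backward - (forward - backward) * cos_omega
)

# Attenuation and intensity at point B.
attenuation = 1.0 / (1.0 + (n - 1.0) * dist_sq / r ** 2)
intensity = base_intensity * attenuation

print(round(r, 4), round(attenuation, 4), round(intensity, 4))
```

Up to rounding, the script reproduces the intermediate values of the example: r of about 16.7663, an attenuation of about 0.6373, and an intensity of about 1.9118.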

Where there are multiple sound sources, then the total intensity at any point is merely 
the sum of the sound intensities from each source, a similar and obvious adaptation of 
the procedure described with respect to Table 2 and the calculation example above. 


EXTENSIONS 

The invention is not limited to a single virtual room, but applies similarly to
several floors with connected rooms. However, some modifications to the way sound
propagation is computed would be appropriate in this case in order to make the
computation more efficient. In this scheme, a room can be treated as a single sound
source to locations outside the room. That is, the new sound source is not used for
sound propagation computations inside the room.

Figure 11 shows one floor 1100 of a virtual building with rooms 1102, 1104,
1106 and 1108 that are connected through doors 1110, 1112 and 1114 respectively. A
room (1102, 1104, 1106) can be connected to several other rooms at the same time,
such as room 1108, which is the virtual equivalent of a shared hallway.

Each room 1102, 1104, 1106 and 1108 is represented by an equivalent
sound source that has an initial intensity A equal to the intensity that would be
experienced by an avatar located in the center of the door to the room as indicated by
the points 1116, 1118 and 1120 respectively. If a room has multiple doors, such as
room 1108, it is represented by as many equivalent sound sources such as points
1116, 1118 and 1120. This simplification is reasonable since the sound does not
propagate through the door in the same manner as in free space inside the room. At
the same time, this provides a better approximation of the sound distribution in a
physical building than that obtained by assuming that the sound does not propagate
beyond the doors of a room. In this manner, an avatar can move throughout virtual
rooms, floors and buildings and eavesdrop and participate in numerous conversations
of interest.
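The equivalent-source idea can be sketched as follows. The text does not say which propagation model is applied at the door, so this sketch assumes the uniform model with a single decay parameter throughout; all function names and the (base intensity, position) representation of occupants are hypothetical.

```python
def uniform_intensity(base, source, point, decay):
    """Intensity of a uniform source at `point` (assumed uniform model)."""
    dx = point[0] - source[0]
    dy = point[1] - source[1]
    return base / (decay * (dx * dx + dy * dy) + 1.0)


def equivalent_door_intensity(occupants, door, decay):
    """Initial intensity of the room's equivalent source: the total intensity
    an avatar standing at the center of the door would experience.

    `occupants` is a list of (base_intensity, (x, y)) pairs.
    """
    return sum(
        uniform_intensity(base, position, door, decay)
        for base, position in occupants
    )


def intensity_outside_room(occupants, door, listener, decay):
    """Intensity heard outside the room, treating the door as one source."""
    door_base = equivalent_door_intensity(occupants, door, decay)
    return uniform_intensity(door_base, door, listener, decay)
```

A listener outside the room then needs one attenuation computation per door instead of one per occupant, which is the efficiency gain the scheme aims for.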

Turning to Figure 12, an alternate embodiment of the present invention is shown
in which a side-bar conversation is held within a "cone of silence". Avatars 1204 and
1206 are present in meeting room 1202, with the participants represented by Avatars
1206 engaged in a private side-bar conversation, shown as within cone of silence
1208. The participants represented by Avatars 1204 are not participants in the side-bar
conversation, and are shown outside cone of silence 1208.

The participants represented by Avatars 1204 excluded from the sidebar conversation
will only hear a strongly attenuated version of the sound of the sidebar conversation
such that the sound generated is just above a level of being audible. This gives the
participants corresponding to Avatars 1204 the sense that there is a conversation
between the sidebar participants represented by Avatars 1206, but does not allow them
to eavesdrop on it. The method and [illegible] for dismissing the sound generated for the
participants represented by Avatars 1206 would be as previously described with
respect to Figures 1-10.
The participants represented by Avatars 1204 can be included in a sidebar
conversation by selecting them in the graphical representation of the virtual meeting
room 1202. Any single participant can start a side-bar conversation. Mechanisms,
using an appropriate check box window, similar to the meeting room inspection
window 518 of Figure 5 may be put in place to allow only current participants in a
sidebar conversation to add new participants.

Turning to Figure 13, an alternate embodiment of the present invention is
shown in graph 1302 where the real-time distance is divided into intervals. This can
be used to simplify the calculations where calculation efficiency is important by
dividing the real-time distance range into a number of intervals and computing only
one attenuation value per interval as shown in graph 1302 of Figure 13. Graph 1302
shows the original attenuation function 1304 and a stepped attenuation function 1306.
The value calculated for the interval is then the attenuation applied to all locations
whose distance from the sound source falls within that interval.

One can take advantage of the division into intervals by selecting the intervals
such that subsequent intervals are mapped to half the attenuation of the previous
interval. This simplifies the computation of the attenuated sound, since now a
floating-point division can be replaced by a shift right by one. One can easily see that
the upper bound of the n-th interval can be computed by the following formula:

    r_n = \sqrt{(2^n - 1)/(N - 1)}

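The interval bounds and the shift-based attenuation can be sketched as below. The sketch assumes integer audio samples, assumes that the last interval simply extends to the normalized distance 1.0, and treats anything beyond 1.0 as silent; the function names are hypothetical.

```python
import math


def interval_upper_bound(n, decay_factor):
    """Upper bound r_n of the n-th interval: sqrt((2^n - 1) / (N - 1))."""
    return math.sqrt((2 ** n - 1) / (decay_factor - 1))


def interval_bounds(count, decay_factor):
    """Upper bounds of `count` intervals; the last one is capped at 1.0."""
    bounds = [
        min(1.0, interval_upper_bound(n, decay_factor))
        for n in range(1, count)
    ]
    bounds.append(1.0)
    return bounds


def stepped_attenuate(sample, relative_distance, bounds):
    """Attenuate an integer sample by halving once per interval crossed.

    The first interval leaves the sample unchanged (shift by 0), the second
    halves it (shift by 1), the third quarters it (shift by 2), and so on.
    """
    for shift, upper in enumerate(bounds):
        if relative_distance <= upper:
            return sample >> shift
    return 0  # beyond the ellipse boundary: treated as inaudible
```

With a decay factor of 5 and three intervals, `interval_bounds(3, 5)` gives bounds of 0.5, about 0.866, and 1.0, so a sample of 1000 becomes 1000, 500 or 250 depending on the interval the listener falls in.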



For example, as shown in the graph 1302 of Figure 13, assume we want to
divide the real-time distance into three intervals, first interval 1308 which goes from
0.0 to r1, second interval 1310 which goes from r1 to r2, and third interval 1312 which
goes from r2 to 1.0, and the decay factor N = 5. From the formula above, we obtain
the interval values:

First Interval 1308: from 0 to r1 = \sqrt{(2^1 - 1)/(5 - 1)} = 0.5

Second Interval 1310: from r1 = 0.5 to r2 = \sqrt{(2^2 - 1)/(5 - 1)} = 0.866

Third Interval 1312: from r2 = 0.866 to 1

With centralized mixing in an MCU, this could be employed to further
advantage as the same attenuated audio packet can be sent to all participants whose
distance from the sound source falls within the same interval. If, for example, as in the
graph of Figure 13, we divide the real-time distance range into three intervals of
attenuation 1, 1/2 and 1/4, we need to attenuate an audio packet at most three times, not
individually for each participant, no matter how many participants there are. This
alternate embodiment reduces the computation necessary where the computation is
performed centrally in an MCU and delivered to the user interfaces of the various
participants.

In a further embodiment of the invention, several different locations associated
with one user can be represented as virtual meeting rooms. These can include the
user's desktop at work, the desktop at home, the hotel room in which the user is
staying, etc. This allows the user to define at which default locations it wants to be
located and contacted for conversation. In this manner, avatars can be used as
presence indicators that show the availability of people in a virtual community.

In a further embodiment, the invention can be extended to three-dimensional
worlds. The notions of navigation cues and eavesdropping are the same.
However, current 3D technologies still require the computing power of a high-end PC
and, at the same time, currently only offer primitive user interfaces that are hard to
navigate.

Although the invention has been described in terms of a preferred and several
alternate embodiments, those skilled in the art will appreciate that other alterations
and modifications can be made without departing from the sphere and scope of the
teachings of the invention. All such alterations and modifications are intended to be
within the sphere and scope of the claims appended hereto.

WE CLAIM:



1. A system for conducting a virtual audio-visual conference between two or
more users comprising:

a) two or more client stations each acting as a signal source and
destination for each respective said user, having a user interface for
audio-visual input and output including audio signal reception and
generation means for receiving and generating audio signals;

b) one or more servers; and

c) a network coupling said client stations and said servers;

wherein each said user is represented as a corresponding movable
visual symbol displayed on said user interfaces of all coupled said
client stations and said audio signal of all said users is generated at
each said client station with an attenuation according to the spatial
position of respective said symbols on said user interfaces.

2. The system of claim 1 wherein said attenuation of said audio signal is 
determined uniformly from the said spatial position of each said signal source. 

3. The system of claim 2 wherein said attenuation is determined according to the
formula:

    \frac{1}{\lambda\left((x_R - x_S)^2 + (y_R - y_S)^2\right) + 1}

where (x_S, y_S) is the spatial position of said signal source and (x_R, y_R) is the spatial
position of said signal destination and λ is the parameter on how fast the signal
decays.

4. The system of claim 1 wherein said attenuation of said audio signal is 
determined based upon the direction in which said movable visual symbol of each 
said signal source is oriented on said user interface. 

5. The system of claim 4 wherein said attenuation is further approximated by an
ellipse defined by an origin of said sound source at a point a which coincides with a
focus of said ellipse, a forward range max, a backward range min and said direction in
which said movable visual symbol of each said signal source is oriented is the unit
vector u_a as measured from the (x,y) axis.

6. The system of claim 5 wherein said attenuation is further determined
according to the formula:

    \frac{1}{1 + (N_a - 1)\,\dfrac{(b - a)\cdot(b - a)}{r(\max_a, \min_a, \pi - \omega)^2}}

where:

b is said sound destination point,

u_a forms an angle φ with the vector (-1,0),

N_a is the parameter on how fast the signal decays,

    \omega = \arccos\frac{(b - a)\cdot u_a}{\sqrt{(b - a)\cdot(b - a)}}

and

    r(\max, \min, \phi) = \frac{2\,\max\,\min}{\max + \min + (\max - \min)\cos\phi}

7. A method for generating a spatial audio signal in a virtual conference
presented on an audio-visual device comprising the steps of:

a) locating the position of a sound generating participant in said virtual 
conference; 

b) locating the position of a listening participant in said virtual 
conference; 

c) calculating the signal strength of said signal received from said 
generating participant at the position of said listening participant based 
upon the distance between said sound generating participant and said 
listening participant; and 

d) generating an output signal corresponding to said calculated signal 
strength. 

8. The method of claim 7 wherein said calculated signal strength is determined
with a uniform attenuation from said position of said sound generating participant.

9. The method of claim 8 wherein said attenuation is determined according to the
formula:

    \frac{1}{\lambda\left((x_R - x_S)^2 + (y_R - y_S)^2\right) + 1}

where (x_S, y_S) is said position of said sound generating participant and (x_R, y_R) is said
position of said listening participant and λ is the parameter on how fast said signal
strength decays.

10. The method of claim 7 wherein said calculated signal strength is determined
based upon the direction in which said sound generating participant is oriented.

11. The method of claim 10 wherein said attenuation is further approximated by
an ellipse defined by an origin of said sound generating participant at a point a which
coincides with a focus of said ellipse, a forward range max, a backward range min and
said direction in which said sound generating participant is oriented is the unit vector
u_a as measured from the (x,y) axis.

12. The system of claim 11 wherein said attenuation is further determined
according to the formula:

    \frac{1}{1 + (N_a - 1)\,\dfrac{(b - a)\cdot(b - a)}{r(\max_a, \min_a, \pi - \omega)^2}}

where:

b is said position of said sound generating participant,

u_a forms an angle φ with the vector (-1,0),

N_a is the parameter on how fast the signal decays,

    \omega = \arccos\frac{(b - a)\cdot u_a}{\sqrt{(b - a)\cdot(b - a)}}

and

    r(\max, \min, \phi) = \frac{2\,\max\,\min}{\max + \min + (\max - \min)\cos\phi}

Amendments to the claims have been filed as follows

1. A system for conducting a virtual audio-visual conference between two or
more users comprising:

a) two or more client stations each acting as a signal source and destination for
each respective said user, having a user interface for audio-visual input and output
including audio signal reception and generation means for receiving and generating
audio signals;

b) one or more servers; and

c) a network coupling said client stations and said servers; wherein each said user
is represented as a corresponding movable visual symbol displayed on said user
interfaces of all coupled said client stations and said audio signal of all said users is
generated at each said client station with an attenuation, according to the spatial
position of respective said symbols on said user interfaces and according to the
direction in which each movable visual symbol of each signal source is oriented on
the user interface.



2. The system of claim 1 wherein said attenuation of said audio signal is 
determined uniformly from the said spatial position on each said signal source. 

3. The system of claim 2 wherein said attenuation is determined according to the
formula:

    \frac{1}{\lambda\left((x_R - x_S)^2 + (y_R - y_S)^2\right) + 1}

where (x_S, y_S) is the spatial position of said signal source and (x_R, y_R) is the spatial
position of said signal destination and λ is the parameter of how fast the signal
decays.

4. The system of any preceding claim wherein said attenuation is further
approximated by an ellipse defined by an origin of said sound source at a point a
which coincides with a focus of said ellipse, a forward range max, a backward range
min and said direction in which said movable visual symbol of each said signal source
is oriented is the unit vector u_a as measured from the (x,y) axis.

5. The system of claim 4 wherein said attenuation is further determined
according to the formula:

    \frac{1}{1 + (N_a - 1)\,\dfrac{(b - a)\cdot(b - a)}{r(\max_a, \min_a, \pi - \omega)^2}}

where:

b is said sound destination point,

u_a forms an angle φ with the vector (-1,0),

N_a is the parameter on how fast the signal decays,

    \omega = \arccos\frac{(b - a)\cdot u_a}{\sqrt{(b - a)\cdot(b - a)}}

and

    r(\max, \min, \phi) = \frac{2\,\max\,\min}{\max + \min + (\max - \min)\cos\phi}



6. A method of conducting a virtual audio-visual conference between two or
more users, each user having a user interface for audio-visual input and output
including audio signal reception and generation means for receiving and generating
audio signals, said method comprising the steps of:

a) representing each user as a movable symbol displayed on said user 
interface; 

b) locating the position of a sound generating participant in said virtual 
conference; 

c) locating the position of a listening participant in said virtual 
conference; 

d) calculating the signal strength of said signal received from said 
generating participant at the position of said listening participant based 
upon the distance between said sound generating participant and said 
listening participant and upon the direction in which the sound 
generating participant is oriented, and 

e) generating an output signal corresponding to said calculated signal 
strength. 

7. The method of claim 6 wherein said calculated signal strength is determined 
with a uniform attenuation from said position of said sound generating participant. 



8. The method of claim 7 wherein said attenuation is determined according to the
formula:

    \frac{1}{\lambda\left((x_R - x_S)^2 + (y_R - y_S)^2\right) + 1}

where (x_S, y_S) is said position of said sound generating participant and (x_R, y_R) is said
position of said listening participant and λ is the parameter on how fast said signal
strength decays.



9. The method of any one of claims 6 to 9 wherein said attenuation is further
approximated by an ellipse defined by an origin of said sound generating participant
at a point a which coincides with a focus of said ellipse, a forward range max, a
backward range min and said direction in which said sound generating participant is
oriented is the unit vector u_a as measured from the (x,y) axis.

10. The system of claim 9 wherein said attenuation is further determined
according to the formula:

    \frac{1}{1 + (N_a - 1)\,\dfrac{(b - a)\cdot(b - a)}{r(\max_a, \min_a, \pi - \omega)^2}}

where:

b is said position of said sound generating participant,

u_a forms an angle φ with the vector (-1,0),

N_a is the parameter on how fast the signal decays,

    \omega = \arccos\frac{(b - a)\cdot u_a}{\sqrt{(b - a)\cdot(b - a)}}

and

    r(\max, \min, \phi) = \frac{2\,\max\,\min}{\max + \min + (\max - \min)\cos\phi}




11. A method for generating a spatial audio signal in a virtual conference 
presented on an audio-visual device comprising the steps of: 

a) locating the position of a sound generating participant in said virtual 
conference; 

b) locating the position of a listening participant in said virtual conference; 

c) calculating the signal strength of said signal received from said generating 
participant at the position of said listening participant based upon the distance 
between said sound generating participant and said listening participant, and 

d) generating an output signal corresponding to said calculated signal strength.

12. The method of claim 11 wherein said calculated signal strength is determined
with a uniform attenuation from said position of said sound generating participant.

13. The method of claim 12 wherein said attenuation is determined according to the
formula:

    \frac{1}{\lambda\left((x_R - x_S)^2 + (y_R - y_S)^2\right) + 1}

where (x_S, y_S) is said position of said sound generating participant and (x_R, y_R) is said
position of said listening participant and λ is the parameter on how fast said signal
strength decays.

14. The method of claim 11 wherein said calculated signal strength is determined
based upon the direction in which said sound generating participant is oriented.

15. The method of claim 14 wherein said attenuation is further approximated
by an ellipse defined by an origin of said sound generating participant at a point a
which coincides with a focus of said ellipse, a forward range max, a backward
range min and said direction in which said sound participant is oriented is the
unit vector u_a as measured from the (x,y) axis.

16. The system of claim 15 wherein said attenuation is further determined
according to the formula:

    \frac{1}{1 + (N_a - 1)\,\dfrac{(b - a)\cdot(b - a)}{r(\max_a, \min_a, \pi - \omega)^2}}

where:

b is said position of said sound generating participant,

u_a forms an angle φ with the vector (-1,0),

N_a is the parameter on how fast the signal decays,

    \omega = \arccos\frac{(b - a)\cdot u_a}{\sqrt{(b - a)\cdot(b - a)}}

and

    r(\max, \min, \phi) = \frac{2\,\max\,\min}{\max + \min + (\max - \min)\cos\phi}